Abstract. A method to calculate the average size of Davis-Putnam-Loveland-Logemann (DPLL) search trees for random computational problems is introduced, and applied to the satisfiability of random CNF formulas (SAT) and the coloring of random graphs (COL). We establish recursion relations for the generating functions of the average numbers of (variable or color) assignments at a given height in the search tree, which allow us to derive the asymptotics of the expected DPLL tree size, $2^{N\omega + o(N)}$, where $N$ is the instance size. $\omega$ is calculated as a function of the input distribution parameters (ratio of clauses per variable for SAT, average vertex degree for COL) and of the branching heuristics.

1 Introduction and main results. 

Many efforts have been devoted to the study of the performance of the Davis-Putnam-Loveland-Logemann (DPLL) procedure [19], and more generally of resolution proof complexity, for combinatorial problems with randomly generated instances. Two examples are random $k$-Satisfiability ($k$-SAT), where an instance $\mathcal{F}$ is a uniformly and randomly chosen set of $M = \alpha N$ disjunctions of $k$ literals built from $N$ Boolean variables and their negations (with no repetition and no complementary literals), and random graph $k$-Coloring ($k$-COL), where an instance $\mathcal{F}$ is an Erdős-Rényi random graph from $G(N, p = c/N)$, i.e. with average vertex degree $c$.

Originally, efforts were concentrated on the random width distribution for $k$-SAT, where each literal appears with a fixed probability. Franco, Purdom and collaborators showed that simplified versions of DPLL had polynomial average-case complexity in this case, see [24,14] for reviews. It was then recognized that the fixed clause length ensemble might provide harder instances for DPLL [12]. Chvátal and Szemerédi indeed showed that DPLL proof size is w.h.p. exponentially large (in $N$ at fixed ratio $\alpha$) for an unsatisfiable instance [?]. Later on, Beame et al. [4] showed that the proof size was w.h.p. bounded from above by $2^{cN/\alpha}$ (for some constant $c$), a decreasing function of $\alpha$. As for the satisfiable case, Frieze and Suen showed that backtracking is irrelevant at small enough ratios $\alpha$ ($< 3.003$ with the Generalized Unit Clause heuristic, to be defined below) [15], allowing DPLL to find a satisfying assignment in polynomial (linear) time. Achlioptas, Beame and Molloy proved that, conversely, at ratios smaller than the generally accepted satisfiability threshold, DPLL takes w.h.p. exponential time to find a satisfying assignment [3]. Altogether these results provide explanations for the 'easy-hard-easy' (or, more precisely, 'easy-hard-less hard') pattern of complexity experimentally observed when running DPLL on random 3-SAT instances [?].

A precise calculation of the average size of the search space explored by DPLL (and #DPLL, a version of the procedure solving the enumeration problems #SAT and #COL) as a function of the parameters $N$ and $\alpha$ or $c$ is difficult due to the statistical correlations between branches in the search tree resulting from backtracking. Heuristic derivations were nevertheless proposed by Cocco
and Monasson based on a 'dynamic annealing' assumption [11)111113] . Hereafter, 
using the linearity of expectation, we show that 'dynamic annealing' turns out not to be an assumption at all when the expected tree size is concerned.

We first illustrate the approach, based on the use of recurrence relations for the generating functions of the number of nodes at a given height in the tree, on the random $k$-SAT problem and the simple Unit Clause (UC) branching heuristic, where unset variables are chosen uniformly at random and assigned to True or False uniformly at random [7,8]. Consider the following counting algorithm:

Procedure #DPLL-UC[$\mathcal{F}$, $A$, $S$]

Call $\mathcal{F}_A$ what is left from instance $\mathcal{F}$ given partial variable assignment $A$;

1. If $\mathcal{F}_A$ is empty, $S \to S + 2^{N-|A|}$, Return; (Solution Leaf)

2. If there is an empty clause in $\mathcal{F}_A$, Return; (Contradiction Leaf)

3. If there is no empty clause in $\mathcal{F}_A$, let $\Gamma_1 = \{$1-clauses $\in \mathcal{F}_A\}$,

if $\Gamma_1 \ne \emptyset$, pick any 1-clause, say, $\ell$, and call DPLL[$\mathcal{F}$, $A \cup \ell$]; (unit-propagation)
if $\Gamma_1 = \emptyset$, pick up an unset literal uniformly at random, say, $\ell$, and call DPLL[$\mathcal{F}$, $A \cup \ell$], then DPLL[$\mathcal{F}$, $A \cup \bar\ell$]; (variable splitting)

End;

#DPLL-UC, called with $A = \emptyset$ and $S = 0$, returns the number $S$ of solutions of the instance $\mathcal{F}$; the history of the search can be summarized as a search tree with leaves marked with solution or contradiction labels. As the instance to be treated and the sequence of operations done by #DPLL-UC are stochastic, so are the numbers $L_S$ and $L_C$ of solution and contradiction leaves respectively.
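The counting procedure above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `count_dpll_uc`, the integer encoding of literals (`v` for a variable, `-v` for its negation) and the seeded `random.Random` are choices made here, not part of the original procedure.

```python
import random

def count_dpll_uc(clauses, n_vars, rng=None):
    """Sketch of #DPLL-UC. Returns (S, solution_leaves, contradiction_leaves).

    clauses: list of tuples of nonzero ints (assumed encoding of literals),
    with no repeated and no complementary literals inside a clause.
    """
    rng = rng or random.Random(0)
    stats = {"S": 0, "sol": 0, "con": 0}

    def reduce(cls, lit):
        """Residual instance after setting literal `lit` to True."""
        out = []
        for c in cls:
            if lit in c:
                continue                                   # satisfied clause is removed
            out.append(tuple(l for l in c if l != -lit))   # false literal is dropped
        return out

    def rec(cls, assigned):
        if not cls:                                        # Line 1: solution leaf
            stats["S"] += 2 ** (n_vars - len(assigned))
            stats["sol"] += 1
            return
        if any(len(c) == 0 for c in cls):                  # Line 2: contradiction leaf
            stats["con"] += 1
            return
        units = [c[0] for c in cls if len(c) == 1]
        if units:                                          # Line 3: unit-propagation
            lit = units[0]
            rec(reduce(cls, lit), assigned | {abs(lit)})
        else:                                              # Line 3: variable splitting
            free = [v for v in range(1, n_vars + 1) if v not in assigned]
            v = rng.choice(free)
            lit = v if rng.random() < 0.5 else -v
            rec(reduce(cls, lit), assigned | {v})
            rec(reduce(cls, -lit), assigned | {v})

    rec(list(clauses), frozenset())
    return stats["S"], stats["sol"], stats["con"]
```

For instance, `count_dpll_uc([(1, 2, 3)], 3)` counts the 7 assignments satisfying a single 3-clause, whatever the random choices made during splitting.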

Theorem 1. Let $k \ge 3$ and $\Omega(t,\alpha,k) = t + \alpha \log_2\left(1 - \frac{k}{2^k}\,t^{k-1} + \frac{k-1}{2^k}\,t^k\right)$. The expectations of the numbers of solution and contradiction leaves in the #DPLL-UC search tree of random $k$-SAT instances with $N$ variables and $\alpha N$ clauses are, respectively, $L_S(N,\alpha,k) = 2^{N\omega_S(\alpha,k)+o(N)}$ with $\omega_S(\alpha,k) = \Omega(1,\alpha,k)$, and $L_C(N,\alpha,k) = 2^{N\omega_C(\alpha,k)+o(N)}$ with $\omega_C(\alpha,k) = \max_{t\in[0;1]} \Omega(t,\alpha,k)$.

An immediate consequence of Theorem 1 is that the expectation value of the total number of leaves, $L_S + L_C$, is $2^{N\omega_C(\alpha,k)+o(N)}$. This result was first found by Méjean, Morel and Reynaud in the particular case $k = 3$ and for ratios $\alpha > 1$ [22]. Our approach not only provides a much shorter proof, but can also be easily extended to other problems and more sophisticated heuristics, see Theorems 2 and 3 below. In addition, Theorem 1 provides us with some information about the expected search tree size of the decision procedure DPLL-UC, corresponding to #DPLL-UC with Line 1 replaced with: If $\mathcal{F}_A$ is empty, output Satisfiable; Halt.



Corollary 1. Let $\alpha > \alpha_u(k)$, the root of $\omega_C(\alpha,k) = 2 + \alpha\log_2(1 - 2^{-k})$, e.g. $\alpha_u(3) = 10.1286...$. The average size of DPLL-UC search trees for random $k$-SAT instances with $N$ variables and $\alpha N$ clauses equals $2^{N\omega_C(\alpha,k)+o(N)}$.
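The exponents of Theorem 1 and the threshold of Corollary 1 are easy to evaluate numerically. The sketch below is a crude illustration, not the author's computation: it assumes a uniform grid over $t \in [0;1]$ for the maximum and a plain bisection for the root, and the function names and bracketing interval are choices made here.

```python
import math

def omega(t, alpha, k=3):
    """Omega(t, alpha, k) of Theorem 1."""
    p = 1 - k / 2**k * t**(k - 1) + (k - 1) / 2**k * t**k
    return t + alpha * math.log2(p)

def omega_S(alpha, k=3):
    """Exponent of the expected number of solution leaves."""
    return omega(1.0, alpha, k)

def omega_C(alpha, k=3, grid=4000):
    """Exponent of the expected number of contradiction leaves
    (maximum over a uniform grid, end points included)."""
    return max(omega(i / grid, alpha, k) for i in range(grid + 1))

def alpha_u(k=3, lo=5.0, hi=30.0, iters=50):
    """Bisection for the root of omega_C(alpha, k) = 2 + alpha*log2(1 - 2^-k).
    The bracket [lo, hi] is an assumption suited to k = 3."""
    def h(a):
        return omega_C(a, k) - 2 - a * math.log2(1 - 2.0 ** -k)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With these definitions, `omega_C(2.0)` and `omega_S(2.0)` coincide, `omega_C(10.0)` exceeds `omega_S(10.0)`, and `alpha_u(3)` lands close to the quoted value 10.1286.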

Functions $\omega_S, \omega_C$ are shown in Figure 1 in the $k = 3$ case. They coincide and are equal to $1 - \alpha\log_2(8/7)$ for $\alpha < \alpha^* = 4.56429...$, while $\omega_C > \omega_S$ for $\alpha > \alpha^*$. In other words, for $\alpha > \alpha^*$, most leaves in #DPLL-UC trees are contradiction leaves, while for $\alpha < \alpha^*$, both contradiction and solution leaf numbers are (to exponential order in $N$) of the same order. As for DPLL-UC trees, notice that $\omega_C(\alpha,3) \asymp \frac{2\ln 2}{3}\,\frac{1}{\alpha} = \frac{0.46209...}{\alpha}$. This behaviour agrees with Beame et al.'s result ($O(1/\alpha)$) for the average resolution proof complexity of unsatisfiable instances [4]. Corollary 1 shows that the expected DPLL tree size can be estimated for a whole range of $\alpha$; we conjecture that the above expression holds for ratios smaller than $\alpha_u$, i.e. down to $\alpha^*$ roughly. For generic $k \ge 3$, we have $\omega_C(\alpha,k) \asymp \frac{k-2}{k-1}\left(\frac{2^k \ln 2}{k(k-1)}\,\frac{1}{\alpha}\right)^{1/(k-2)}$; the decrease of $\omega_C$ with $\alpha$ is therefore slower and slower as $k$ increases.
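The large-$\alpha$ scaling can be recovered by a short expansion of $\Omega$. The following sketch assumes that the maximum is reached at a small reduced height $t_* \ll 1$, so that the $t^k$ term and higher orders of the logarithm can be dropped; this assumption is ours and is only justified a posteriori by $t_* \to 0$ as $\alpha \to \infty$.

```latex
\begin{align*}
\Omega(t,\alpha,k) &\simeq t - \frac{\alpha\,k}{2^k \ln 2}\; t^{k-1} \qquad (t \ll 1),\\
\partial_t \Omega(t_*,\alpha,k) = 0 \;&\Longrightarrow\; t_*^{\,k-2} = \frac{2^k \ln 2}{k(k-1)}\,\frac{1}{\alpha},\\
\omega_C(\alpha,k) \simeq \Omega(t_*,\alpha,k) &= t_* - \frac{t_*}{k-1}
 = \frac{k-2}{k-1}\left(\frac{2^k \ln 2}{k(k-1)}\,\frac{1}{\alpha}\right)^{1/(k-2)} .
\end{align*}
```

For $k = 3$ this gives $t_* = 4\ln 2/(3\alpha)$ and $\omega_C \simeq t_*/2 = 2\ln 2/(3\alpha)$.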

So far, no expression for $\omega$ has been obtained for heuristics more sophisticated than UC. We consider the Generalized Unit Clause (GUC) heuristic [8,2], where the shortest clauses are preferentially satisfied. The associated decision procedure, DPLL-GUC, corresponds to DPLL-UC with Line 3 replaced with: Pick a clause uniformly at random among the shortest clauses, and a literal, say, $\ell$, in the clause; call DPLL[$\mathcal{F}$, $A \cup \ell$], then DPLL[$\mathcal{F}$, $A \cup \bar\ell$].

Fig. 1. Logarithms of the average numbers of solution and contradiction leaves, respectively $\omega_S$ and $\omega_C$, in #DPLL-UC search trees versus ratio $\alpha$ of clauses per variable for random 3-SAT. Notice that $\omega_C$ coincides with the logarithm of the expected size of #DPLL-UC at all ratios $\alpha$, and with the one of DPLL-UC search trees for $\alpha > 10.1286...$.

Theorem 2. Define $m(x_2) = \frac12\left(1 + \sqrt{1 + 4x_2}\right) - 2x_2$, $y_3(y_2)$ the solution of the ordinary differential equation $dy_3/dy_2 = 3(1 + y_2 - 2y_3)/(2m(y_2))$ such that $y_3(1) = 1$, and

$$\omega^g(\alpha) = \max_{\frac34 < y_2 \le 1} \left[ \int_{y_2}^{1} \frac{dz}{m(z)}\, \log_2\big(2z + m(z)\big)\, \exp\left(-\int_z^1 \frac{dw}{m(w)}\right) + \alpha \log_2 y_3(y_2) \right]. \qquad (1)$$

Let $\alpha > \alpha_u^g = 10.2183...$, the root of $\omega^g(\alpha) + \alpha\log_2(8/7) = 2$. The expected size of the DPLL-GUC search tree for random 3-SAT instances with $N$ variables and $\alpha N$ clauses is $2^{N\omega^g(\alpha)+o(N)}$.

Notice that, at large $\alpha$, $\omega^g(\alpha) \asymp \frac{3+\sqrt5}{6\ln 2}\left[\ln\left(\frac{1+\sqrt5}{2}\right)\right]^2 \frac{1}{\alpha} = \frac{0.29154...}{\alpha}$, in agreement with the $1/\alpha$ scaling established in [4]. Furthermore, the multiplicative factor is smaller than the one for UC, showing that DPLL-GUC is more efficient than DPLL-UC in proving unsatisfiability.
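The two large-$\alpha$ prefactors quoted for UC and GUC can be compared directly. The short sketch below only evaluates the closed-form constants and the function $m$ of Theorem 2; the variable names are chosen here for illustration.

```python
import math

def m(x2):
    """Function m of Theorem 2."""
    return 0.5 * (1 + math.sqrt(1 + 4 * x2)) - 2 * x2

golden = (1 + math.sqrt(5)) / 2

# Prefactors of 1/alpha in the large-alpha behaviour of the exponents.
uc_prefactor = 2 * math.log(2) / 3
guc_prefactor = (3 + math.sqrt(5)) / (6 * math.log(2)) * math.log(golden) ** 2

print(f"UC : {uc_prefactor:.5f} / alpha")
print(f"GUC: {guc_prefactor:.5f} / alpha")
```

Note that $m$ vanishes at $x_2 = 3/4$ and is negative at $x_2 = 1$, which is where the integrals entering $\omega^g$ become singular.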



A third application is the analysis of the counterpart of GUC for the random 3-COL problem. The version of DPLL we have analyzed operates as follows [?]. Initially, each vertex is assigned a list of 3 available colors. In the course of the procedure, a vertex, say, $v$, with the smallest number of available colors, say, $j$, is chosen at random and uniformly. DPLL-GUC then removes $v$, and successively branches to the $j$ color assignments corresponding to removal of one of the $j$ colors of $v$ from the lists of the neighbors of $v$. The procedure backtracks when a vertex with no color left is created (contradiction), or no vertex is left (a proper coloring is found).
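The list-coloring procedure just described can be sketched as follows. This is an illustrative decision version that halts at the first proper coloring; the function name `dpll_guc_3col`, the edge-list input and the node counter are assumptions made here.

```python
import random

def dpll_guc_3col(n, edges, rng=None):
    """Sketch of the list-coloring DPLL for 3-COL.
    Returns (colorable, number_of_visited_nodes)."""
    rng = rng or random.Random(0)
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    nodes = 0

    def rec(lists):
        # lists: uncolored vertex -> frozenset of still-available colors
        nonlocal nodes
        nodes += 1
        if not lists:
            return True                    # no vertex left: proper coloring found
        if any(len(cols) == 0 for cols in lists.values()):
            return False                   # a vertex with no color left: contradiction
        smallest = min(len(cols) for cols in lists.values())
        v = rng.choice([u for u, cols in lists.items() if len(cols) == smallest])
        for color in sorted(lists[v]):     # branch on the j available colors of v
            child = {u: (cols - {color} if u in adj[v] else cols)
                     for u, cols in lists.items() if u != v}
            if rec(child):
                return True
        return False

    ok = rec({v: frozenset({0, 1, 2}) for v in range(n)})
    return ok, nodes
```

A triangle is reported colorable, while the complete graph on four vertices is not, independently of the random tie-breaking.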

Theorem 3. Define $\omega^h(c) = \max_{0<t<1}\left[\frac{c}{6}\,t^2 - \frac{c}{3}\,t - (1-t)\ln 2 + \ln\left(3 - e^{-2ct/3}\right)\right]$. Let $c > c_u^h = 13.1538...$, the root of $\omega^h(c) + \frac{c}{6} = 2\ln 3$. The expected size of the DPLL-GUC search tree for deciding 3-COL on random graphs from $G(N, c/N)$ with $N$ vertices is $e^{N\omega^h(c)+o(N)}$.
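As for Theorem 1, the exponent of Theorem 3 and its threshold can be evaluated numerically. The sketch below assumes a uniform grid over $t$ (end points included, which does not change the maximum here) and a plain bisection; names and the bracketing interval are choices made for illustration.

```python
import math

def gamma_h(t, c):
    """Bracketed expression of Theorem 3 (natural logarithms)."""
    return (c * t * t / 6 - c * t / 3 - (1 - t) * math.log(2)
            + math.log(3 - math.exp(-2 * c * t / 3)))

def omega_h(c, grid=4000):
    """Maximum of gamma_h over a uniform grid on [0, 1]."""
    return max(gamma_h(i / grid, c) for i in range(grid + 1))

def c_u(lo=5.0, hi=30.0, iters=50):
    """Bisection for the root of omega_h(c) + c/6 = 2 ln 3.
    The bracket [lo, hi] is an assumption."""
    def h(c):
        return omega_h(c) + c / 6 - 2 * math.log(3)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling `c_u()` returns a value close to the quoted threshold 13.1538.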

Asymptotically, $\omega^h(c) \asymp \frac{3\ln 2}{2}\,\frac{1}{c^2} = \frac{1.0397...}{c^2}$, in agreement with Beame et al.'s scaling ($\Theta(1/c^2)$) [5]. An extension of Theorem 3 to higher values of the number $k$ of colors gives $\omega^h(c,k) \asymp \frac{k(k-2)}{k-1}\left[\frac{2\ln 2}{k-1}\right]^{1/(k-2)} c^{-(k-1)/(k-2)}$. This result is compatible with the bounds derived in [?], and suggests that the $\Theta(c^{-(k-1)/(k-2)})$ dependence could hold w.h.p. (and not only in expectation).

2 Recurrence equation for #DPLL-UC search tree

Let $\mathcal{F}$ be an instance of the 3-SAT problem defined over a set of $N$ Boolean variables $X$. A partial assignment $A$ of length $T$ ($\le N$) is the specification of the truth values of $T$ variables in $X$. We denote by $\mathcal{F}_A$ the residual instance given $A$. A clause $c \in \mathcal{F}_A$ is said to be an $\ell$-clause with $\ell \in \{0,1,2,3\}$ if the number of false literals in $c$ is equal to $3 - \ell$. We denote by $C_\ell(\mathcal{F}_A)$ the number of $\ell$-clauses in $\mathcal{F}_A$. The instance $\mathcal{F}$ is said to be satisfied under $A$ if $C_\ell(\mathcal{F}_A) = 0$ for $\ell = 0,1,2,3$, unsatisfied (or violated) under $A$ if $C_0(\mathcal{F}_A) \ge 1$, undetermined under $A$ otherwise. The clause vector of an undetermined or satisfied residual instance $\mathcal{F}_A$ is the three-dimensional vector $C$ with components $C_1(\mathcal{F}_A), C_2(\mathcal{F}_A), C_3(\mathcal{F}_A)$. The search tree associated to an instance $\mathcal{F}$ and a run of #DPLL is the tree whose nodes carry the residual assignments $A$ considered in the course of the search. The height $T$ of a node is the length of the attached assignment.
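These definitions translate directly into code. The sketch below computes the numbers of $\ell$-clauses of a residual instance and its status; the integer encoding of literals and the function names are assumptions made for illustration.

```python
def clause_vector(clauses, assignment):
    """Return (C0, C1, C2, C3) for the residual instance under `assignment`.

    clauses: 3-tuples of nonzero ints (v for a variable, -v for its negation);
    assignment: dict mapping a variable to True/False (partial assignment A).
    """
    counts = [0, 0, 0, 0]
    for clause in clauses:
        satisfied, false_literals = False, 0
        for lit in clause:
            value = assignment.get(abs(lit))
            if value is None:
                continue                   # variable not assigned yet
            if value == (lit > 0):
                satisfied = True           # a true literal satisfies the clause
            else:
                false_literals += 1
        if not satisfied:                  # satisfied clauses leave the instance
            counts[3 - false_literals] += 1
    return tuple(counts)

def status(clauses, assignment):
    """Classify the instance under a partial assignment."""
    c0, c1, c2, c3 = clause_vector(clauses, assignment)
    if c0 >= 1:
        return "unsatisfied"
    if c1 == c2 == c3 == 0:
        return "satisfied"
    return "undetermined"
```

The height of a node is then simply `len(assignment)`.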

It was shown by Chao and Franco [7,8] that, during the first descent in the search tree, i.e. prior to any backtracking, the distribution of residual instances remains uniformly random conditioned on the numbers of $\ell$-clauses. This statement remains correct for heuristics more sophisticated than UC, e.g. GUC, SC$_1$ [8,2], and was recently extended to splitting heuristics based on variable occurrences by Kaporis, Kirousis and Lalas [?]. Clearly, in this context, uniformity is lost after backtracking enters into play (with the exception of Suen and Frieze's analysis of a limited version of backtracking [15]). Though this limitation appears to forbid (and has forbidden so far) the extension of average-case studies of backtrack-free DPLL to full DPLL with backtracking, we point out here that it is not as severe as it looks. Indeed, let us forget about how the #DPLL or DPLL search tree is built and consider its final state. We refer to a branch (of the search tree) as the shortest path from the root node (empty assignment) to a leaf. The two key remarks underlying the present work can be informally stated as follows. First, the expected size of a #DPLL search tree can be calculated from the knowledge of the statistical distribution of (residual instances on) a single branch; no characterization of the correlations between distinct branches in the tree is necessary. Secondly, the statistical distribution of (residual instances on) a single branch is simple since, along a branch, uniformity is preserved (as in the absence of backtracking). More precisely,

Lemma 1 (from Chao & Franco [7]). Let $\mathcal{F}_A$ be a residual instance attached to a node $A$ at height $T$ in a #DPLL-UC search tree produced from an instance $\mathcal{F}$ drawn from the random 3-SAT distribution. Then the set of $\ell$-clauses in $\mathcal{F}_A$ is uniformly random conditioned on its size $C_\ell(\mathcal{F}_A)$ and the number $N - T$ of unassigned variables for each $\ell \in \{0,1,2,3\}$.

Proof. The above Lemma is an immediate application of Lemma 3 in Achlioptas' Card Game framework, which establishes uniformity for algorithms (a) 'pointing to a particular card (clause)', or (b) 'naming a variable that has not yet been assigned a value' (Section 2.1 in Ref. [2]). The operation of #DPLL-UC along a branch precisely amounts to these two operations: unit-propagation relies on action (a), and variable splitting on (b). □

Lemma 1 does not address the question of uniformity among different branches. Residual instances attached to two (or more) nodes on distinct branches in the search tree are correlated. However, these correlations can be safely ignored in calculating the average number of residual instances, in much the same way as the average value of the sum of correlated random variables is simply the sum of their average values.

Proposition 1. Let $L(C,T)$ be the expectation of the number of undetermined residual instances with clause vector $C$ at height $T$ in the #DPLL-UC search tree, and $G(x_1,x_2,x_3;T) = \sum_{C} x_1^{C_1} x_2^{C_2} x_3^{C_3}\, L(C,T)$ its generating function. Then, for $0 \le T < N$,

$$G(x_1,x_2,x_3;T+1) = \frac{1}{f_1}\, G(f_1,f_2,f_3;T) + \left(2 - \frac{1}{f_1}\right) G(0,f_2,f_3;T) - 2\,G(0,0,0;T) \qquad (2)$$

where $f_1, f_2, f_3$ stand for the functions $f_1^{(T)}(x_1) = x_1 + \frac12\mu(1 - 2x_1)$, $f_2^{(T)}(x_1,x_2) = x_2 + \mu(x_1 + 1 - 2x_2)$, $f_3^{(T)}(x_2,x_3) = x_3 + \frac32\mu(x_2 + 1 - 2x_3)$, and $\mu = 1/(N-T)$. The generating function $G$ is entirely defined from recurrence relation (2) and the initial condition $G(x_1,x_2,x_3;0) = (x_3)^{\alpha N}$.

Proof. Let $\delta_n$ denote the Kronecker function ($\delta_n = 1$ if $n = 0$, $\delta_n = 0$ otherwise), and $B_m^{n,q} = \binom{m}{n} q^n (1-q)^{m-n}$ the binomial distribution. Let $A$ be a node at height $T$, and $\mathcal{F}_A$ the attached residual instance. Call $C$ the clause vector of $\mathcal{F}_A$. Assume first that $C_1 \ge 1$. Pick up one 1-clause, say, $\ell$. Call $z_j$ the number of $j$-clauses that contain $\ell$ or $\bar\ell$ (for $j = 1,2,3$). From Lemma 1, the $z_j$'s are binomial variables with parameter $j/(N-T)$ among $C_j - \delta_{j-1}$ (the 1-clause that is satisfied through unit-propagation is removed). Among the $z_j$ clauses, $w_{j-1}$ contained $\bar\ell$ and are reduced to $(j-1)$-clauses, while the remaining $z_j - w_{j-1}$ contained $\ell$ and are satisfied and removed. From Lemma 1 again, $w_{j-1}$ is a binomial variable with parameter $1/2$ among $z_j$. The probability that the instance produced has no empty clause ($w_0 = 0$) is $B_{z_1}^{0,1/2} = 2^{-z_1}$. Thus, setting $\mu = \frac{1}{N-T}$,

$$M_P[C',C;T] = \sum_{z_3=0}^{C_3} B_{C_3}^{z_3,3\mu} \sum_{w_2=0}^{z_3} B_{z_3}^{w_2,1/2} \sum_{z_2=0}^{C_2} B_{C_2}^{z_2,2\mu} \sum_{w_1=0}^{z_2} B_{z_2}^{w_1,1/2} \sum_{z_1=0}^{C_1-1} B_{C_1-1}^{z_1,\mu}\, \frac{1}{2^{z_1}}\; \delta_{C_3'-(C_3-z_3)}\, \delta_{C_2'-(C_2-z_2+w_2)}\, \delta_{C_1'-(C_1-1-z_1+w_1)}$$

expresses the probability that a residual instance at height $T$ with clause vector $C$ gives rise to a (non-violated) residual instance with clause vector $C'$ at height $T+1$ through unit-propagation. Assume now $C_1 = 0$. Then, a yet unset variable is chosen and set to True or False uniformly at random. The calculation of the new vector $C'$ is identical to the unit-propagation case above, except that $z_1 = w_0 = 0$ (absence of 1-clauses), and two nodes are produced (instead of one). Hence,

$$M_{UC}[C',C;T] = 2 \sum_{z_3=0}^{C_3} B_{C_3}^{z_3,3\mu} \sum_{w_2=0}^{z_3} B_{z_3}^{w_2,1/2} \sum_{z_2=0}^{C_2} B_{C_2}^{z_2,2\mu} \sum_{w_1=0}^{z_2} B_{z_2}^{w_1,1/2}\; \delta_{C_3'-(C_3-z_3)}\, \delta_{C_2'-(C_2-z_2+w_2)}\, \delta_{C_1'-w_1}$$

expresses the expected number of residual instances at height $T+1$ and with clause vector $C'$ produced from a residual instance at height $T$ and with clause vector $C$ through UC branching.

Now, consider all the nodes $A_i$ at height $T$, with $i = 1, \ldots, \mathcal{L}$. Let $o_i$ be the operation done by #DPLL-UC on $A_i$. $o_i$ represents either unit-propagation (literal $\ell_i$ set to True) or variable splitting (literals $\ell_i$ set to T and F on the descendent nodes respectively). Denoting by $E_Y(X)$ the expectation value of a quantity $X$ over variable $Y$, $L(C';T+1) = E_{\mathcal{F},\{A_i,o_i\}}\left(\sum_{i=1}^{\mathcal{L}} \mathcal{M}[C';A_i,o_i]\right)$, where $\mathcal{M}$ is the number (0, 1 or 2) of residual instances with clause vector $C'$ produced from $A_i$ after #DPLL-UC has carried out operation $o_i$. Using the linearity of expectation,

$$L(C';T+1) = E_{\mathcal{F}}\left(\sum_{i=1}^{\mathcal{L}} E_{\{A_i,o_i\}}\big(\mathcal{M}[C';A_i,o_i]\big)\right) = E_{\mathcal{F}}\left(\sum_{i=1}^{\mathcal{L}} M[C',C_i;T]\right)$$

where $C_i$ is the clause vector of the residual instance attached to $A_i$, and $M[C',C;T] = (1 - \delta_{C_1})\, M_P[C',C;T] + \delta_{C_1}\, M_{UC}[C',C;T]$. Gathering assignments with identical clause vectors gives the recurrence relation $L(C',T+1) = \sum_{C} M[C',C;T]\, L(C,T)$. Recurrence relation (2) for the generating function is an immediate consequence. The initial condition over $G$ stems from the fact that the instance is originally drawn from the random 3-SAT distribution, $L(C;0) = \delta_{C_1}\, \delta_{C_2}\, \delta_{C_3-\alpha N}$. □
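Recurrence relation (2) can be unrolled exactly for small sizes. The sketch below evaluates $G$ at a given point with exact rational arithmetic, under the assumption that the $M = \alpha N$ clauses are drawn independently; the function names are chosen here. As a sanity check, the weighted sum $\sum_H 2^{N-H} G(0,0,0;H)$, i.e. the mean value of the counter $S$ returned by #DPLL-UC, should equal the mean number of solutions $2^N (7/8)^M$.

```python
from fractions import Fraction

def G(x1, x2, x3, T, N, M):
    """Evaluate G(x1, x2, x3; T) by unrolling recurrence (2) down to T = 0.
    N: number of variables, M: number of 3-clauses. Cost grows like 3^T."""
    if T == 0:
        return x3 ** M                       # initial condition (x3)^M
    mu = Fraction(1, N - (T - 1))            # mu = 1/(N - T') at height T' = T - 1
    f1 = x1 + mu * (1 - 2 * x1) / 2
    f2 = x2 + mu * (x1 + 1 - 2 * x2)
    f3 = x3 + Fraction(3, 2) * mu * (x2 + 1 - 2 * x3)
    return (G(f1, f2, f3, T - 1, N, M) / f1
            + (2 - 1 / f1) * G(0, f2, f3, T - 1, N, M)
            - 2 * G(0, 0, 0, T - 1, N, M))

def expected_solutions(N, M):
    """Sum over heights H of 2^(N-H) times the mean number of solution leaves."""
    return sum(2 ** (N - H) * G(0, 0, 0, H, N, M) for H in range(N + 1))

print(expected_solutions(6, 5), Fraction(7, 8) ** 5 * 2 ** 6)
```

The recursion is exponential in $T$ and is only meant for very small $N$; the asymptotic analysis of the next section is what makes the recurrence useful at large $N$.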



3 Asymptotic analysis and application to DPLL-UC 

The asymptotic analysis of $G$ relies on the following technical lemma:

Lemma 2. Let $\gamma(x_2,x_3,t) = (1-t)^3 x_3 + \frac{3t}{2}(1-t)^2 x_2 + \frac{t}{8}(12 - 3t - 2t^2)$, with $t \in ]0;1[$ and $x_2, x_3 > 0$. Define $S_0(T) = \sum_{H=0}^{T} 2^{T-H}\, G(0,0,0;H)$. Then, in the large $N$ limit, $S_0([tN]) \le 2^{N(t+\alpha\log_2\gamma(0,0,t))+o(N)}$ and $G(\frac12,x_2,x_3;[tN]) = 2^{N(t+\alpha\log_2\gamma(x_2,x_3,t))+o(N)}$.

Due to space limitations, we give here only some elements of the proof. The first step in the proof is inspired by Knuth's kernel method [20]: when $x_1 = \frac12$, $f_1 = \frac12$ and recurrence relation (2) simplifies and is easier to handle. Iterating this equation then allows us to relate the value of $G$ at height $T$ and coordinates $(\frac12, x_2, x_3)$ to the (known) value of $G$ at height 0 and coordinates $(\frac12, y_2, y_3)$, which are functions of $x_2, x_3, T, N$, and $\alpha$. The function $\gamma$ is the value of $y_3$ when $T, N$ are sent to infinity at fixed ratio $t$. The asymptotic statement about $S_0(T)$ comes from the previous result and the fact that the dominant terms in the sum defining $S_0$ are the ones with $H$ close to $T$.
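The limiting function $\gamma$ can be recovered by a short calculation, sketched here under two simplifying assumptions that are ours: the subtracted term $2\,G(0,0,0;T)$ is ignored, and the discrete iteration of $f_2, f_3$ is replaced by its continuous limit, with reduced height $s = H/N$ and $u = 1 - s$ running from $1-t$ (height $[tN]$) to $1$ (height 0).

```latex
\begin{align*}
&x_1 = \tfrac12:\qquad G(\tfrac12,x_2,x_3;T+1) = 2\,G(\tfrac12,f_2,f_3;T) - 2\,G(0,0,0;T),\\[2pt]
&\frac{dy_2}{du} = \frac{1}{u}\Big(\tfrac32 - 2y_2\Big),\qquad
 \frac{dy_3}{du} = \frac{3}{2u}\big(y_2 + 1 - 2y_3\big),\qquad
 y_2\big|_{u=1-t} = x_2,\;\; y_3\big|_{u=1-t} = x_3,\\[2pt]
&y_2(u) = \tfrac34 + \big(x_2 - \tfrac34\big)\frac{(1-t)^2}{u^2},\qquad
 \frac{d}{du}\big(u^3 y_3\big) = \tfrac{21}{8}\,u^2 + \tfrac32\big(x_2 - \tfrac34\big)(1-t)^2,\\[2pt]
&y_3\big|_{u=1} = (1-t)^3 x_3 + \tfrac{3t}{2}(1-t)^2 x_2 + \tfrac{t}{8}\big(12 - 3t - 2t^2\big) = \gamma(x_2,x_3,t).
\end{align*}
```

The $[tN]$ factors of 2 accumulated along the iteration give $2^{tN}$, and the initial condition $(x_3)^{\alpha N}$ evaluated at $y_3$ gives $\gamma^{\alpha N}$, which together reproduce the exponent $t + \alpha\log_2\gamma(x_2,x_3,t)$ of Lemma 2.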

Proposition 2. Let $L_C(N,T,\alpha)$ be the expected number of contradiction leaves of height $T$ in the #DPLL-UC resolution tree of random 3-SAT instances with $N$ variables and $\alpha N$ clauses, and $\epsilon > 0$. Then, for $t \in [\epsilon; 1-\epsilon]$ and $\alpha > 0$, $\Omega(t,\alpha,3) \le \frac{1}{N}\log_2 L_C(N,[tN],\alpha) + o(1) \le \max_{0 \le h \le t} \Omega(h,\alpha,3)$, where $\Omega$ is defined in Theorem 1.

Observe that a contradiction may appear with a positive (and non-exponentially small in $N$) probability as soon as two 1-clauses are present. These 1-clauses will be present as a result of 2-clause reduction when the residual instances include a large number ($O(N)$) of 2-clauses. As this is the case for a finite fraction of residual instances, $G(1,1,1;T)$ is not exponentially larger than $L_C(T)$. Use of the monotonicity of $G$ with respect to $x_1$ and Lemma 2 gives the announced lower bound (recognize that $\Omega(t,\alpha,3) = t + \alpha\log_2\gamma(1,1;t)$). To derive the upper bound, remark that contradiction leaves cannot be more numerous than the number of branches created through splittings; hence $L_C(T)$ is bounded from above by the number of splittings at smaller heights $H$, that is, $\sum_{H<T} G(0,1,1;H)$. Once more, we use the monotonicity of $G$ with respect to $x_1$ and Lemma 2 to obtain the upper bound. The complete proof will be given in the full version.

Proof. (Theorem 1) By definition, a solution leaf is a node in the search tree where no clauses are left; the average number $L_S$ of solution leaves is thus given by $L_S = \sum_{H=0}^{N} L(0,0,0;H) = \sum_{H=0}^{N} G(0,0,0;H)$. A straightforward albeit useful upper bound on $L_S$ is obtained from $L_S \le S_0(N)$. By definition of the algorithm #DPLL, $S_0(N)$ is the average number of solutions of an instance with $\alpha N$ clauses over $N$ variables drawn from the random 3-SAT distribution, $S_0(N) = 2^N (7/8)^{\alpha N}$. This upper bound is indeed tight (to within terms that are subexponential in $N$), as most solution leaves have heights equal, or close to $N$. To show this, consider $\epsilon > 0$, and write

$$L_S \ge \sum_{H=N(1-\epsilon)}^{N} G(0,0,0;H) \ge 2^{-N\epsilon} \sum_{H=N(1-\epsilon)}^{N} 2^{N-H}\, G(0,0,0;H) = 2^{-N\epsilon}\, S_0(N)\, [1 - A]$$

with $A = 2^{N\epsilon}\, S_0(N(1-\epsilon))/S_0(N)$. From Lemma 2, $A \le (\kappa + o(1))^{\alpha N}$ with $\kappa = \frac{\gamma(0,0,1-\epsilon)}{7/8} = 1 - \frac97\epsilon^2 + \frac27\epsilon^3 < 1$ for small enough $\epsilon$ (but $O(1)$ with respect to $N$). We conclude that $A$ is exponentially small in $N$, and $-\epsilon + 1 - \alpha\log_2\frac87 + o(1) \le \frac1N \log_2 L_S \le 1 - \alpha\log_2\frac87$. Choosing arbitrarily small $\epsilon$ allows us to establish the statement about the asymptotic behaviour of $L_S$ in Theorem 1.

Proposition 2, with arbitrarily small $\epsilon$, immediately leads to Theorem 1 for $k = 3$, for the average number of contradiction leaves, $L_C$, equals the sum over all heights $T = tN$ (with $0 \le t \le 1$) of $L_C(N,T,\alpha)$, and the sum is bounded from below by its largest term and, from above, by $N$ times this largest term. The statement on the number of leaves following Theorem 1 comes from the observation that the expected total number of leaves is $L_S + L_C$, and $\omega_S(\alpha,3) = \Omega(1,\alpha,3) \le \max_{t\in[0;1]} \Omega(t,\alpha,3) = \omega_C(\alpha,3)$. □

Proof. (Corollary 1) Let $P_{sat}$ be the probability that a random 3-SAT instance with $N$ variables and $\alpha N$ clauses is satisfiable. Define $\#L_{sat}$ and $\#L_{unsat}$ (respectively, $L_{sat}$ and $L_{unsat}$) the expected numbers of leaves in #DPLL-UC (resp. DPLL-UC) search trees for satisfiable and unsatisfiable instances respectively. All these quantities depend on $\alpha$ and $N$. As the operations of #DPLL and DPLL coincide for unsatisfiable instances, we have $\#L_{unsat} = L_{unsat}$. Conversely, $\#L_{sat} \ge L_{sat}$ since DPLL halts after having encountered the first solution leaf. Therefore, the difference between the average sizes $\#L$ and $L$ of #DPLL-UC and DPLL-UC search trees satisfies $0 \le \#L - L = P_{sat}\,(\#L_{sat} - L_{sat}) \le P_{sat}\,\#L_{sat}$. Hence, $1 - P_{sat}\,\#L_{sat}/\#L \le L/\#L \le 1$. Using $\#L_{sat} \le 2^N$, $P_{sat} \le 2^N (7/8)^{\alpha N}$ from the first moment theorem and the asymptotic scaling for $\#L$ given in Theorem 1, we see that the left hand side of the previous inequality tends to 1 when $N \to \infty$ and $\alpha > \alpha_u$. □



Proofs for higher values of k are identical, and will be given in the full version. 



4 The GUC heuristic for random SAT and COL 

The above analysis of the DPLL-UC search tree can be extended to the GUC 
heuristic [8] , where literals are preferentially chosen to satisfy 2-clauses (if any) . 
The outlines of the proofs of Theorems 2 and 3 are given below; details will be 
found in the full version. 

3-SAT. The main difference with respect to the UC case is that the two branches issued from the split are not statistically identical. In fact, the literal $\ell$ chosen by GUC satisfies at least one clause, while this clause is reduced to a shorter clause when $\ell$ is set to False. The cases $C_2 \ge 1$ and $C_2 = 0$ have also to be considered separately. With $f_1, f_2, f_3$ defined in the same way as in the UC case, we obtain

$$G(x_1,x_2,x_3;T+1) = \frac{1}{f_1}\, G(f_1,f_2,f_3;T) + \left(\frac{1+f_1}{f_2} - \frac{1}{f_1}\right) G(0,f_2,f_3;T) + \left(\frac{1+f_2}{f_3} - \frac{1+f_1}{f_2}\right) G(0,0,f_3;T) - \frac{1+f_2}{f_3}\, G(0,0,0;T). \qquad (3)$$

The asymptotic analysis of $G$ follows the lines of Section 3. Choosing $f_2 = f_1 + f_1^2$, i.e. $x_1 = (-1 + \sqrt{1 + 4x_2})/2 + O(1/N)$, allows us to cancel the second term on the r.h.s. of (3). Iterating relation (3), we establish the counterpart of Lemma 2 for GUC: the value of $G$ at height $[tN]$ and argument $x_2, x_3$ is equal to its (known) value at height 0 and argument $y_2, y_3$ times the product of factors $\frac{1}{f_1}$, up to an additive term, $A$, including iterates of the third and fourth terms on the right hand side of (3). $y_2, y_3$ are the values at 'time' $\tau = 0$ of the solutions of the ordinary differential equations (ODE) $dY_2/d\tau = -m(Y_2)/(1-\tau)$, $dY_3/d\tau = -3\big((1+Y_2)/2 - Y_3\big)/(1-\tau)$ with 'initial' condition $Y_2(t) = x_2$, $Y_3(t) = x_3$ (recall that function $m$ is defined in Theorem 2). Eliminating 'time' between $Y_2, Y_3$ leads to the ODE in Theorem 2. The first term on the r.h.s. in the expression (1) of $\omega^g$ corresponds to the logarithm of the product of factors $\frac{1}{f_1}$ between heights 0 and $T$. The maximum over $y_2$ in expression (1) for $\omega^g$ is equivalent to the maximum over the reduced height $t$ appearing in $\omega_C$ in Theorem 1 (see also Proposition 2). Finally, choosing $\alpha > \alpha_u^g$ ensures that, on the one hand, the additive term $A$ mentioned above is asymptotically negligible and, on the other hand, the ratio of the expected sizes of #DPLL-GUC and DPLL-GUC is asymptotically equal to unity (see proof of Corollary 1).

3-COL. The uniformity expressed by Lemma 1 holds: the subgraph resulting from the coloring of $T$ vertices is still Erdős-Rényi-like with edge probability $\frac{c}{N}$, conditioned to the numbers $C_j$ of vertices with $j$ available colors [?]. The generating function $G$ of the average number of residual assignments equals $(x_3)^N$ at height $T = 0$ and obeys the recurrence relation, for $T < N$,

$$G(x_1,x_2,x_3;T+1) = \frac{1}{f_1}\, G(f_1,f_2,f_3;T) + \left(\frac{2}{f_2} - \frac{1}{f_1}\right) G(0,f_2,f_3;T) + \left(\frac{3}{f_3} - \frac{2}{f_2}\right) G(0,0,f_3;T) \qquad (4)$$

with $f_1 = (1-\mu)x_1$, $f_2 = (1-2\mu)x_2 + 2\mu x_1$, $f_3 = (1-3\mu)x_3 + 3\mu x_2$, and $\mu = c/(3N)$. Choosing $f_1 = \frac12 f_2$, i.e. $x_1 = \frac12 x_2 + O(1/N)$, allows us to cancel the second term on the r.h.s. of (4). Iterating relation (4), we establish the counterpart of Lemma 2 for GUC: the value of $G$ at height $[tN]$ and argument $x_2, x_3$ is equal to its (known) value at height 0 and argument $y_2, y_3$ respectively, times the product of factors $\frac{1}{f_1}$, up to an additive term, $A$, including iterates of the last term in (4). An explicit calculation leads to $G(\frac12 x_2, x_2, x_3; [tN]) = e^{N\gamma^h(x_2,x_3,t)+o(N)} + A$ for $x_2, x_3 > 0$, where $\gamma^h(x_2,x_3,t) = \frac{c}{6}t^2 - \frac{c}{3}t + (1-t)\ln(x_2/2) + \ln\left[3 + e^{-2ct/3}(2x_3/x_2 - 3)\right]$; at $t = 0$ this reduces to $\ln x_3$, as required by the initial condition. As in Proposition 2, we bound from below (respectively, above) the number of contradiction leaves in the #DPLL-GUC tree by the exponential of ($N$ times) the value of function $\gamma^h$ in $x_2 = x_3 = 1$ at reduced height $t$ (respectively, lower than $t$). The maximum over $t$ in Theorem 3 is equivalent to the maximum over the reduced height $t$ appearing in $\omega_C$ in Theorem 1 (see also Proposition 2). Finally, we choose $c > c_u^h$ to make the additive term $A$ negligible. Following the notations of Corollary 1, we use $L_{sat} \le 3^N$, and $P_{sat} \le 3^N e^{-Nc/6+o(N)}$, the expected number of 3-colorings for random graphs from $G(N, c/N)$.

5 Conclusion and perspectives 

We emphasize that the average #DPLL tree size can be calculated for even more complex heuristics, e.g. making decisions based on literal degrees [?]. This task requires, in practice, that one is able: first, to find the correct conditioning ensuring uniformity along a branch (as in the study of DPLL in the absence of backtracking); secondly, to determine the asymptotic behaviour of the associated generating function $G$ from the recurrence relation for $G$.

To some extent, the present work is an analytical implementation of an idea put forward by Knuth thirty years ago [21,11]. Knuth indeed proposed to estimate the average computational effort required by a backtracking procedure through successive runs of the non-backtracking counterpart, each weighted in an appropriate way [21]. This weight is, in the language of Section 2, simply the probability of a branch (given the heuristic under consideration) in the #DPLL search tree times $2^S$, where $S$ is the number of splits [11].
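Knuth's sampling idea can be illustrated on an arbitrary tree. The sketch below follows one random root-to-leaf path and accumulates the product of branching factors met along the way (which is $2^S$ after $S$ binary splits); the function names and the `children` callback interface are assumptions made here, not taken from the references.

```python
import random

def knuth_estimate(root, children, rng):
    """One run of a Knuth-style estimator of the number of nodes in a tree.

    children(node) returns the list of child nodes (empty for a leaf).
    The returned value is 1 + d1 + d1*d2 + ..., where d1, d2, ... are the
    branching factors along a uniformly chosen random path.
    """
    estimate, weight, node = 1, 1, root
    while True:
        kids = children(node)
        if not kids:
            return estimate
        weight *= len(kids)        # inverse probability of the path so far
        estimate += weight
        node = rng.choice(kids)

def mean_estimate(root, children, runs=1000, seed=0):
    """Average of independent runs; each single run is an unbiased estimate."""
    rng = random.Random(seed)
    return sum(knuth_estimate(root, children, rng) for _ in range(runs)) / runs
```

On a complete binary tree every run returns the exact size; on unbalanced trees single runs fluctuate, which is consistent with the heavy tails mentioned in the text.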

Since the amount of backtracking seems to have a heavy tail [16,17], the
expectation is often not a good predictor in practice. Knowledge of the second 
moment of the search tree size would be very precious; its calculation, currently 
under way, requires us to treat the correlations between nodes attached to dis- 
tinct branches. Calculating the second moment is a step towards the distant 
goal of finding the expectation of the logarithm, which probably requires a deep 
understanding of correlations as in the replica theory of statistical mechanics. 

Last of all, #DPLL is a complete procedure for enumeration. Understanding 
its average-case operation will, hopefully, provide us with valuable information 
not only on the algorithm itself but also on random decision problems e.g. new 
bounds on the sat/unsat or col/uncol thresholds, or insights on the statistical 
properties of solutions. 